Search CORE

24 research outputs found

EDBL: a General Lexical Basis for the Automatic Processing of Basque

Author: Aldezabal Izaskun
Ansa Olatz
Arrieta Bertol
Artola Xabier
Ezeiza Aitzol
Hernandez G.
Lersundi Mikel
Publication venue: IRCS Workshop on linguistic databases.
Publication date: 22/06/2006

EDBL (Euskararen Datu-Base Lexikala) is a general-purpose lexical database used in Basque text-processing tasks. It is a large repository of lexical knowledge (currently around 80,000 entries) that acts as a basis and support for a number of different NLP tasks, providing lexical information for several language tools: morphological analysis, spell checking and correction, lemmatization and tagging, syntactic analysis, and so on. It has been designed to be neutral with respect to the different linguistic formalisms, and flexible and open enough to accept new types of information. A browser-based user interface makes it easy for the lexicographer to consult the database, correct and update entries, add new ones, etc. The paper presents the conceptual schema and the main features of the database, along with some problems encountered in its design and implementation in a commercial DBMS. Given the diversity of the lexical entities and the complex relationships existing among them, three total specializations have been defined under the main class of the hierarchy that represents the conceptual schema. The first one divides all the entries in EDBL into standard and non-standard Basque entries. The second divides the units in the database into dictionary entries (classified into the different parts of speech) and other entries (mainly non-independent morphemes and irregularly inflected forms). Finally, another total specialization has been established between single-word entries and multiword lexical units; this permits us to describe the morphotactics of single-word entries, and the constitution and surface realization schemas of multiword lexical units. A hierarchy of typed feature structures (FS) has been designed to map the entities and relationships in the database conceptual schema. The FSs are coded in TEI-conformant SGML, and Feature Structure Declarations (FSD) have been made for all the types of the hierarchy.
Feature structures are used as a delivery format to export the lexical information from the database. The information coded in this way is subsequently used as input by the different language analysis tools.

ArtXiker - @HAL

HAL Descartes

Hal-Diderot

Multilingual audio information management system based on semantic knowledge in complex environments

Author: Barroso Nora
Calvo Pilar M
Ezeiza Aitzol
Fernández Elsa
Hernandez Carmen
Lopez-de-Ipina Karmele
Susperregi Unai
Publication venue: Neural Computing and Applications
Publication date: 03/12/2020

This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by the limited resources (financial, material, human, and audio resources); the poor quality of the audio signal taken from an internet radio channel; the multilingual context (Spanish, French, and Basque, which is under-resourced in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition, the system is constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. As a result, the initial goals have been accomplished and the usability of the final application has been tested successfully, even with inexperienced users.

Apollo (Cambridge)

Multilingual audio information management system based on semantic knowledge in complex environments

Author: Barroso Nora
Calvo Pilar M
Ezeiza Aitzol
Fernández Elsa
Hernandez Carmen
Lopez-de-Ipina Karmele
Susperregi Unai
Publication venue: Organisation for Economic Co-Operation and Development (OECD)
Publication date: 02/02/2021

Apollo (Cambridge)

Alzheimer Disease Diagnosis based on Automatic Spontaneous Speech Analysis

Author: Alonso Jesús B.
Barroso Nora
Ecay-Torres Miriam
Estanga A.
Ezeiza Aitzol
Faundez-Zanuy Marcos
Lopez-de-Ipiña Karmele
Solé-Casals Jordi
Travieso Carlos M.
Publication date: 01/01/2012

Alzheimer’s disease (AD) is the most prevalent form of progressive degenerative dementia and has a high socio-economic impact in Western countries; it is therefore one of the most active research areas today. Its diagnosis is sometimes made by excluding other dementias, and definitive confirmation must be obtained through a post-mortem study of the patient's brain tissue. The purpose of this paper is to contribute to the improvement of early diagnosis of AD and the assessment of its degree of severity, using automatic analysis performed by non-invasive intelligent methods. The methods selected in this case are Automatic Spontaneous Speech Analysis (ASSA) and Emotional Temperature (ET), which have the great advantage of being non-invasive, low-cost, and free of side effects.

UPCommons. Portal del coneixement obert de la UPC

RIUVic

Multilingual audio information management system based on semantic knowledge in complex environments

Author: Barroso Moreno Nora
Calvo Salomón Pilar María
Ezeiza Ramos Aitzol
Fernández Gómez de Segura Elsa
Hernández Gómez María del Carmen
López de Ipiña Peña Miren Karmele
Susperregui Aseguinolaza Unai
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by the limited resources (financial, material, human, and audio resources); the poor quality of the audio signal taken from an internet radio channel; the multilingual context (Spanish, French, and Basque, which is under-resourced in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition, the system is constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. As a result, the initial goals have been accomplished and the usability of the final application has been tested successfully, even with inexperienced users. This work is being funded by Grants: TEC201677791-C4 from Plan Nacional de I + D + i, Ministry of Economic Affairs and Competitiveness of Spain and from the DomusVi Foundation Kms para recorder, the Basque Government (ELKARTEK KK-2018/00114, GEJ IT1189-19), the Government of Gipuzkoa (DG18/14 DG17/16), UPV/EHU (GIU19/090), COST ACTION (CA18106, CA15225).

Archivo Digital para la Docencia y la Investigación

TweetLID: a benchmark for tweet language identification

Author: Aitzol Ezeiza
Arkaitz Zubiaga
Iñaki Alegria
Iñaki San Vicente
José Ramom Pichel
Nora Aranberri
Pablo Gamallo
Víctor Fresno
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Language identification, the task of determining the language a given text is written in, has progressed substantially in recent decades. However, three main issues remain unresolved: (1) distinguishing similar languages, (2) detecting multilingualism in a single document, and (3) identifying the language of short texts. In this paper, we describe our work on the development of a benchmark to encourage further research in these three directions, set forth an evaluation framework suitable for the task, and make a dataset of annotated tweets publicly available for research purposes. We also describe the shared task we organized to validate and assess the evaluation framework and dataset with systems submitted by seven different participants, and analyze the performance of these systems. The evaluation of the results submitted by the participants of the shared task helped us shed some light on the shortcomings of state-of-the-art language identification systems, and gave insight into the extent to which the brevity, multilingualism, and language similarity found in texts degrade the performance of language identifiers. Our dataset of nearly 35,000 tweets and the evaluation framework provide researchers and practitioners with suitable resources to further study the aforementioned issues in language identification within a common setting that enables results to be compared with one another.

Crossref

Warwick Research Archives Portal Repository

Queen Mary Research Online

ZMC 211-3 - KAEDAH MATEMATIK II MAC-APRIL 1989.pdf

Author: Alonso Jesús B.
Barroso Nora
Ecay-Torres Miriam
Egiraun Harkaitz
Ezeiza Aitzol
Faundez-Zanuy Marcos
Lopez-de-Ipiña Karmele
Martinez de Lizardui Unai
Martinez-Lage Pablo
Solé-Casals Jordi
Travieso Carlos M.
Publication date: 01/04/1989

The work presented here is part of a larger study to identify novel technologies and biomarkers for early Alzheimer disease (AD) detection, and it focuses on evaluating the suitability of a new approach for early AD diagnosis by non-invasive methods. The purpose is to examine, in a pilot study, the potential of applying intelligent algorithms to speech features obtained from suspected patients in order to contribute to the improvement of the diagnosis of AD and its degree of severity. To this end, Artificial Neural Networks (ANN) have been used for the automatic classification of the two classes (AD and control subjects). Two human aspects have been analyzed for feature selection: Spontaneous Speech and Emotional Response. Not only linear features but also non-linear ones, such as Fractal Dimension, have been explored. The approach is non-invasive, low-cost, and free of side effects. The experimental results obtained were very satisfactory and promising for early diagnosis and classification of AD patients.

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Archivo Digital para la Docencia y la Investigación

Repository@USM

RIUVic

Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: A fractal dimension approach

Author: Aitzol Ezeiza
Blanca Beitia
C.M. Travieso
Harkaitz Eguiraun
J.B. Alonso
Jordi Solé-Casals
Karmele López-de-Ipiña
Miriam Ecay-Torres
Nora Barroso
Pablo Martinez-Lage
Publication venue: Elsevier BV
Publication date: 01/01/2015

Alzheimer’s disease (AD) is the most prevalent form of degenerative dementia; it has a high socio-economic impact in Western countries. The purpose of our project is to contribute to earlier diagnosis of AD and allow better estimates of its severity by using automatic analysis performed through new biomarkers extracted through non-invasive intelligent methods. The method selected is based on speech biomarkers derived from the analysis of spontaneous speech (SS). Thus the main goal of the present work is feature search in SS, aiming at pre-clinical evaluation whose results can be used to select appropriate tests for AD diagnosis. The feature set employed in our earlier work offered some hopeful conclusions but failed to capture the nonlinear dynamics of speech that are present in the speech waveforms. The extra information provided by the nonlinear features could be especially useful when training data is limited. In this work, the fractal dimension (FD) of the observed time series is combined with linear parameters in the feature vector in order to enhance the performance of the original system while controlling the computational cost. © 2014 Elsevier Ltd. All rights reserved.

Crossref

RIUVic

Replikantearen jazz-arima

Author: Ezeiza Aitzol
López de Ipiña Karmele
López de Ipiña Montxo
Publication venue: Elhuyar Fundazioa
Publication date: 01/01/2004

Secretaría de Estado de Cultura